A Hybrid Approach to Semantic Hashtag Clustering in Social Media
Abstract
The uncontrolled usage of hashtags in social media makes them vary widely in both the quality of their semantics and their frequency of usage. Such variations pose a challenge to current approaches, which capitalize on either the lexical semantics of a hashtag by using metadata or the contextual semantics of a hashtag by using the texts associated with it. This thesis presents a hybrid approach to clustering hashtags based on their semantics, designed in two phases. The first phase is a sense-level metadata-based semantic clustering algorithm that can differentiate among distinct senses of a hashtag, as opposed to the hashtag word itself. The gold standard test demonstrates that sense-level clusters are significantly more accurate than word-level clusters. The second phase is a hybrid semantic clustering algorithm using a consensus clustering approach, which finds the consensus between metadata-based sense-level semantic clusters and text-based semantic clusters. The gold standard test shows that the hybrid algorithm outperforms both the text-based algorithm and the metadata-based algorithm for a majority of ground truths tested and that it never underperforms both baseline algorithms. In addition, a larger-scale performance study, conducted with a focus on disagreements in cluster assignments between algorithms, shows that the hybrid algorithm makes the correct cluster assignment in a majority of disagreement cases.
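To make the idea of "finding the consensus" between two clusterings concrete, the following is a minimal sketch of one generic consensus technique, co-association (pairwise agreement) voting. It is not the algorithm from the thesis, which the abstract does not specify. The function names, the `threshold` parameter, and the toy hashtags are all assumptions made for illustration, and the sketch works at the word level rather than the sense level.

```python
from itertools import combinations


def co_association(items, clusterings):
    """Return, for each pair of items, the fraction of clusterings
    that place the two items in the same cluster."""
    scores = {}
    for a, b in combinations(items, 2):
        together = sum(1 for labels in clusterings if labels[a] == labels[b])
        scores[(a, b)] = together / len(clusterings)
    return scores


def consensus_clusters(items, clusterings, threshold=1.0):
    """Merge items whose pairwise agreement reaches `threshold`
    (hypothetical parameter), using a small union-find structure."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in co_association(items, clusterings).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Toy input: cluster labels from a metadata-based and a text-based algorithm.
hashtags = ["#apple", "#iphone", "#fruit", "#banana"]
metadata_based = {"#apple": 0, "#iphone": 1, "#fruit": 0, "#banana": 0}
text_based = {"#apple": 0, "#iphone": 0, "#fruit": 1, "#banana": 1}

print(consensus_clusters(hashtags, [metadata_based, text_based]))
```

With `threshold=1.0`, two hashtags end up together only when both input clusterings agree on them, so disagreements split clusters. A lower threshold merges more aggressively. A real hybrid method would need a principled way to resolve such disagreements, which is the part this sketch leaves out.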
Similar resources
Hybrid semantic clustering of hashtags
Clustering hashtags based on their semantics is an important problem with many applications. The uncontrolled usage of hashtags in social media, however, makes the quality of semantics and the frequency of usage vary a lot, and this poses a challenge to the current approaches which capitalize on either the lexical semantics of a hashtag (by using metadata) or the contextual semantics of a hasht...
Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach
In social networking/microblogging environments, a #tag is often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...
Sense-Level Semantic Clustering of Hashtags in Social Media
We enhance the accuracy of the currently available semantic hashtag clustering method, which leverages hashtag semantics extracted from dictionaries such as WordNet and Wikipedia. While immune to the uncontrolled and often sparse usage of hashtags, the current method distinguishes hashtag semantics only at the word level. Unfortunately, a word can have multiple senses representing the exact sem...
th Workshop on Making Sense of Microposts (#Microposts2015) Big things
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms which all use different information sources—either textual, temporal, geographic or community features—have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic inform...
Semantics-driven Event Clustering in Twitter Feeds
Detecting events using social media such as Twitter has many useful applications in real-life situations. Many algorithms which all use different information sources—either textual, temporal, geographic or community features—have been developed to achieve this task. Semantic information is often added at the end of the event detection to classify events into semantic topics. But semantic informa...